I'm trying to use PCIe Gen 3 by moving the card into our newer machine in the lab. The card shows up in the output of lspci -vv as expected:
05:00.0 Serial controller: Xilinx Corporation Device 7024 (prog-if 01 [16450])
Subsystem: Xilinx Corporation Device 0007
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 160
Region 0: Memory at 51100000 (32-bit, non-prefetchable) [size=1M]
Region 1: Memory at 51200000 (32-bit, non-prefetchable) [size=64K]
Capabilities: [40] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold-)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [48] MSI: Enable+ Count=1/1 Maskable- 64bit+
Address: 00000000fee006d8 Data: 0000
Capabilities: [60] Express (v2) Endpoint, MSI 00
DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s <64ns, L1 unlimited
ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 25.000W
DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
MaxPayload 256 bytes, MaxReadReq 512 bytes
DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
LnkCap: Port #0, Speed 5GT/s, Width x4, ASPM L0s, Exit Latency L0s unlimited
ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-
LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 5GT/s (ok), Width x4 (ok)
TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Range B, TimeoutDis- NROPrPrP- LTR-
10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt- EETLPPrefix-
EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
FRS- TPHComp- ExtTPHComp-
AtomicOpsCap: 32bit- 64bit- 128bitCAS-
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR- OBFF Disabled,
AtomicOpsCtl: ReqEn-
LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete- EqualizationPhase1-
EqualizationPhase2- EqualizationPhase3- LinkEqualizationRequest-
Retimer- 2Retimers- CrosslinkRes: unsupported
Capabilities: [100 v1] Device Serial Number 00-00-00-00-00-00-00-00
Kernel driver in use: xdma
Kernel modules: xdma
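For the Gen 3 question, the two lines worth pulling out of this dump are LnkCap (what the endpoint advertises) and LnkSta (what was actually negotiated). A minimal sketch of extracting them from saved lspci -vv text follows; the function names are hypothetical and the sample text is copied from the output above.

```python
import re
from typing import Dict

# Matches the "LnkCap:" and "LnkSta:" lines only; "LnkSta2:" and "LnkCtl2:"
# are skipped because the field name must be followed directly by a colon.
LINK_RE = re.compile(r"(LnkCap|LnkSta):.*?Speed\s+([\d.]+)GT/s.*?Width\s+x(\d+)")


def parse_link_info(lspci_text: str) -> Dict[str, dict]:
    """Return {'LnkCap': {...}, 'LnkSta': {...}} with speed (GT/s) and width."""
    info: Dict[str, dict] = {}
    for match in LINK_RE.finditer(lspci_text):
        field, speed, width = match.groups()
        # Keep the first occurrence of each field (one device per dump assumed).
        info.setdefault(field, {"speed_gts": float(speed), "width": int(width)})
    return info


def link_at_capability(info: Dict[str, dict]) -> bool:
    """True when the negotiated link equals what the endpoint advertises."""
    return "LnkCap" in info and info.get("LnkSta") == info["LnkCap"]


SAMPLE = """\
LnkCap: Port #0, Speed 5GT/s, Width x4, ASPM L0s, Exit Latency L0s unlimited
LnkSta: Speed 5GT/s (ok), Width x4 (ok)
LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete- EqualizationPhase1-
"""

if __name__ == "__main__":
    link = parse_link_info(SAMPLE)
    print(link)
    print("running at advertised capability:", link_at_capability(link))
```

Here the endpoint itself advertises 5 GT/s x4, so the link is already running at the card's stated capability regardless of which slot it sits in.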
I moved the driver files from fe01 to the newer desktop. Running build-install-driver-linux.sh in /home/pioneer/pcie_testing/XilinxAR65444/Linux failed at first. The make call in the script complains:
function ‘mmiowb’ [-Werror=implicit-function-declaration]
921 | mmiowb();
| ^~~~~~
cc1: some warnings being treated as errors
I googled the issue and found this forum post:
https://support.xilinx.com/s/question/0D52E00006hpLONSA2/compilation-error-pcie-drivers-for-linux?language=en_US
It basically just says to comment out all instances of mmiowb(). There is exactly one in xdma-core.c, so I commented it out, and then the driver compiled. Loading the driver and running some tests seemed to work:
root@pioneer-MS-7D41:/home/pioneer/pcie_testing/XilinxAR65444/Linux/Xilinx_Answer_65444_Linux_Files/tests# ./load_driver.sh
xdma 61440 0
Loading driver...
The Kernel module installed correctly and the xmda devices were recognized.
DONE
root@pioneer-MS-7D41:/home/pioneer/pcie_testing/XilinxAR65444/Linux/Xilinx_Answer_65444_Linux_Files/tests# ./run_test.sh
Info: Number of enabled h2c channels = 2
Info: Number of enabled c2h channels = 2
Info: The PCIe DMA core is memory mapped.
Info: Running PCIe DMA memory mapped write read test
transfer size: 1024
transfer count: 1
Info: Writing to h2c channel 0 at address offset 0.
Info: Writing to h2c channel 1 at address offset 1024.
Info: Wait for current transactions to complete.
sscanf() = 1, value = 0x00000400
sscanf() = 1, value = 0x00000000
sscanf() = 1, value = 0x00000001
device = /dev/xdma0_h2c_0, address = 0x00000000, size = 0x00000400, offset = 0x00000000, count = 1
host memory buffer = 0x559fc9f48800
CLOCK_MONOTONIC reports 0.000095085 seconds (total) for last transfer of 1024 bytes
Transfer speed: 10.27 MB/s
sscanf() = 1, value = 0x00000400
sscanf() = 1, value = 0x00000400
sscanf() = 1, value = 0x00000001
device = /dev/xdma0_h2c_1, address = 0x00000400, size = 0x00000400, offset = 0x00000000, count = 1
host memory buffer = 0x55ff1d607800
CLOCK_MONOTONIC reports 0.000060218 seconds (total) for last transfer of 1024 bytes
Transfer speed: 16.22 MB/s
Info: Writing to h2c channel 0 at address offset 2048.
Info: Writing to h2c channel 1 at address offset 3072.
Info: Wait for current transactions to complete.
sscanf() = 1, value = 0x00000400
sscanf() = 1, value = 0x00000800
sscanf() = 1, value = 0x00000001
device = /dev/xdma0_h2c_0, address = 0x00000800, size = 0x00000400, offset = 0x00000000, count = 1
host memory buffer = 0x555ae6579800
CLOCK_MONOTONIC reports 0.000055266 seconds (total) for last transfer of 1024 bytes
Transfer speed: 17.67 MB/s
sscanf() = 1, value = 0x00000400
sscanf() = 1, value = 0x00000c00
sscanf() = 1, value = 0x00000001
device = /dev/xdma0_h2c_1, address = 0x00000c00, size = 0x00000400, offset = 0x00000000, count = 1
host memory buffer = 0x55790b0e3800
CLOCK_MONOTONIC reports 0.000055462 seconds (total) for last transfer of 1024 bytes
Transfer speed: 17.61 MB/s
Info: Reading from c2h channel 0 at address offset 0.
sscanf() = 1, value = 0x00000400
sscanf() = 1, value = 0x00000000
sscanf() = 1, value = 0x00000001
device = /dev/xdma0_c2h_0, address = 0x00000000, size = 0x00000400, offset = 0x00000000, count = 1
host memory buffer = 0x555ba77dc000
Info: Reading from c2h channel 1 at address offset 1024.
Info: Wait for the current transactions to complete.
sscanf() = 1, value = 0x00000400
sscanf() = 1, value = 0x00000400
sscanf() = 1, value = 0x00000001
device = /dev/xdma0_c2h_1, address = 0x00000400, size = 0x00000400, offset = 0x00000000, count = 1
host memory buffer = 0x559ca193d000
CLOCK_MONOTONIC reports 0.000066850 seconds (total) for last transfer of 1024 bytes
Transfer speed: 14.61 MB/s
CLOCK_MONOTONIC reports 0.000048417 seconds (total) for last transfer of 1024 bytes
Transfer speed: 20.17 MB/s
Info: Reading from c2h channel 0 at address offset 2048.
sscanf() = 1, value = 0x00000400
sscanf() = 1, value = 0x00000800
sscanf() = 1, value = 0x00000001
device = /dev/xdma0_c2h_0, address = 0x00000800, size = 0x00000400, offset = 0x00000000, count = 1
host memory buffer = 0x55654b13d000
Info: Reading from c2h channel 1 at address offset 3072.
Info: Wait for the current transactions to complete.
sscanf() = 1, value = 0x00000400
sscanf() = 1, value = 0x00000c00
sscanf() = 1, value = 0x00000001
device = /dev/xdma0_c2h_1, address = 0x00000c00, size = 0x00000400, offset = 0x00000000, count = 1
host memory buffer = 0x555b3109f000
CLOCK_MONOTONIC reports 0.000077351 seconds (total) for last transfer of 1024 bytes
Transfer speed: 12.63 MB/s
CLOCK_MONOTONIC reports 0.000049442 seconds (total) for last transfer of 1024 bytes
Transfer speed: 19.75 MB/s
Info: Checking data integrity.
Info: Data check passed for address range 0 - 1024.
Info: Data check passed for address range 1024 - 2048.
Info: Data check passed for address range 2048 - 3072.
Info: Data check passed for address range 3072 - 4096.
Info: All PCIe DMA memory mapped tests passed.
Info: All tests in run_tests.sh passed.
root@pioneer-MS-7D41:/home/pioneer/pcie_testing/XilinxAR65444/Linux/Xilinx_Answer_65444_Linux_Files/tests#
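Since the tests pass with the mmiowb() call commented out, that edit could be scripted for the next machine the driver gets moved to. The following is only a sketch of the same workaround: the function name is hypothetical, it only touches lines that consist of nothing but the call, and it assumes, as the forum post suggests, that dropping the call is acceptable on this kernel.

```python
import re
from typing import Tuple

# A source line that contains only a call to mmiowb(); group 1 is the indentation.
MMIOWB_CALL = re.compile(r"^(\s*)mmiowb\(\s*\)\s*;\s*$")


def comment_out_mmiowb(source: str) -> Tuple[str, int]:
    """Return (patched_source, number_of_calls_commented_out)."""
    patched = []
    count = 0
    for line in source.splitlines(keepends=True):
        body = line.rstrip("\r\n")
        ending = line[len(body):]          # preserve the original line ending
        match = MMIOWB_CALL.match(body)
        if match:
            patched.append(f"{match.group(1)}/* mmiowb(); */{ending}")
            count += 1
        else:
            patched.append(line)
    return "".join(patched), count


if __name__ == "__main__":
    # Hypothetical stand-in for the relevant part of xdma-core.c.
    sample = "static void f(void)\n{\n\twrite_reg();\n\tmmiowb();\n}\n"
    patched, count = comment_out_mmiowb(sample)
    print(f"commented out {count} call(s)")
    print(patched, end="")
```

Applied to xdma-core.c, the count should come back as 1, matching the single instance found by hand.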
Next, I ran the speed tests:
root@pioneer-MS-7D41:/home/pioneer/pcie_testing/XilinxAR65444/Linux/Xilinx_Answer_65444_Linux_Files/tests# ./dma_from_device -d /dev/xdma0_c2h_0 -f data/datafile_32M.bin -s 33554432
sscanf() = 1, value = 0x02000000
device = /dev/xdma0_c2h_0, address = 0x00000000, size = 0x02000000, offset = 0x00000000, count = 1
host memory buffer = 0x7ff3074ed000
CLOCK_MONOTONIC reports 0.041413173 seconds (total) for last transfer of 33554432 bytes
Transfer speed: 772.70 MB/s
root@pioneer-MS-7D41:/home/pioneer/pcie_testing/XilinxAR65444/Linux/Xilinx_Answer_65444_Linux_Files/tests# ./dma_to_device -d /dev/xdma0_h2c_0 -f data/datafile_32M.bin -s 33554432
sscanf() = 1, value = 0x02000000
device = /dev/xdma0_h2c_0, address = 0x00000000, size = 0x02000000, offset = 0x00000000, count = 1
host memory buffer = 0x7f5e9806d400
CLOCK_MONOTONIC reports 0.035558064 seconds (total) for last transfer of 33554432 bytes
Transfer speed: 899.94 MB/s
root@pioneer-MS-7D41:/home/pioneer/pcie_testing/XilinxAR65444/Linux/Xilinx_Answer_65444_Linux_Files/tests#
This shows that putting the card in the PCIe 3.0 slot does not speed it up. This is expected, because Vivado somehow hardcodes a limit of 5.0 GT/s, which matches the LnkCap line in the lspci output above.
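To put the measured numbers next to the link limits, here is a rough calculation. It assumes the standard line encodings (8b/10b for Gen 2, 128b/130b for Gen 3) and ignores packet and protocol overhead, so the ceilings are upper bounds rather than achievable DMA rates. The function names are hypothetical. The "Transfer speed" figures in the log match bytes / seconds / 2^20, so the tool's "MB/s" appears to be MiB/s.

```python
# Raw per-lane rate (GT/s) and line-encoding efficiency for each PCIe generation.
PCIE_GENERATIONS = {
    2: (5.0, 8 / 10),      # 8b/10b encoding
    3: (8.0, 128 / 130),   # 128b/130b encoding
}

MIB = 1024 * 1024


def link_ceiling_mb_s(gen, lanes):
    """Post-encoding link bandwidth in MB/s (10^6 bytes/s), before protocol overhead."""
    gt_per_s, efficiency = PCIE_GENERATIONS[gen]
    return gt_per_s * 1e9 * efficiency * lanes / 8 / 1e6


def reported_speed(num_bytes, seconds):
    """Reproduce the 'Transfer speed' figure printed by the test tools (MiB/s)."""
    return num_bytes / seconds / MIB


if __name__ == "__main__":
    size = 33554432  # bytes, from the -s argument in the log
    for name, seconds in [("c2h", 0.041413173), ("h2c", 0.035558064)]:
        mib_s = reported_speed(size, seconds)
        mb_s = size / seconds / 1e6
        share = mb_s / link_ceiling_mb_s(2, 4)
        print(f"{name}: {mib_s:.2f} MiB/s = {mb_s:.2f} MB/s, {share:.0%} of Gen 2 x4")
    print(f"Gen 2 x4 ceiling: {link_ceiling_mb_s(2, 4):.0f} MB/s")
    print(f"Gen 3 x4 ceiling: {link_ceiling_mb_s(3, 4):.0f} MB/s")
```

Both directions land below half of the Gen 2 x4 ceiling for a single 32 MiB transfer, so the 5 GT/s cap is not necessarily the only thing limiting these particular runs.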
Upon further investigation, this limit is specified by the board files. I opened another board file (Versal VCK190 Evaluation Platform) and created an XDMA IP block. It offered link speeds up to PCIe Gen 4 (16 GT/s). So the speed we see appears to be a limitation of the card.
Maybe we can try the HTG-K700 to see whether it can be pushed further.